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Abstract —This paper proposes a low-computational Bayesian 
algorithm for noisy sparse recovery (NSR), called BHT-BP. 
In this framework, we consider an LDPC-like measurement 
matrices which has a tree-structured property, and additive white 
Gaussian noise. BHT-BP has a joint detection-and-estimation 
structure consisting of a sparse support detector and a nonzero 
estimator. The support detector is designed under the criterion of 
the minimum detection error probability using a nonparametric 
belief propagation (nBP) and composite binary hypothesis tests. 
The nonzeros are estimated in the sense of linear MMSE, where 
the support detection result is utilized. BHT-BP has its strength in 
noise robust support detection, effectively removing quantization 
errors caused by the uniform sampling-based nBP. Therefore, 
in the NSR problems, BHT-BP has advantages over CS-BP (13l 
which is an existing nBP algorithm, being comparable to other 
recent CS solvers, in several aspects. In addition, we examine 
impact of the minimum nonzero value of sparse signals via BHT- 
BP, on the basis of the results of i27ll.li28l. t30l . Our empirical 
result shows that variation of x m i n is reflected to recovery 
performance in the form of SNR shift. 

Index Terms —Noisy sparse recovery, compressed sensing, non¬ 
parametric belief propagation, composite hypothesis testing, 
joint detection-and-estimation 

I. Introduction 

A. Background 

Robust reconstruction of sparse signals against measure¬ 
ment noise is a key problem in real-world applications of 
compressed sensing (CS) m-m. We refer to such signal 
recovery problems as noisy sparse signal recovery (NSR) 
problems. The NSR problems can be directly defined as 
an Zg-norm minimization problem 0,0. Solving the l 0 - 
norm task is very limited in practice when the system size 
(M, N) becomes large. Therefore, several alternative solvers 
have been developed to relax computational cost of the U r 
norm task, such as L-norm minimization solvers, e.g. , Dantzig 
selector (Zi-DS) 0 and Lasso 0. and greedy type algorithms, 
e.g., OMP [8j and COSAMP 0. Another popular approach 
to the computational relaxation is based on the Bayesian 
philosophy m-m. In the Bayesian framework, the / (r norm 
task is described as maximum a posteriori (MAP) estimation 
problem, and sparse solution then is sought by imposing a 
certain sparsifying prior probability density function (PDF) 
with respect to the target signal Go). 

Recently, Baysian solvers applying belief propagation 
(BP) have been introduced and caught attention as a low- 
computational approach to handle the NSR problems in a large 
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system setup ED-ED- These BP-based solvers reduce com¬ 
putational cost of the signal recovery by removing unneces¬ 
sary and duplicated computations using statistical dependency 
within the linear system. Such BP solvers are also called 
message-passing algorithms because their recovery behavior 
is well explained by passing statistical messages over a tree- 
structured graph representing the statistical dependency ED- 

For implementation of BP, two approaches have been mainly 
discussed according to message representation methods: para¬ 
metric BP (pBP) fT5l - ifTTI . lf39l , l40l where the BP-message is 
approximated to a Gaussian PDF; hence, only the mean and 
variance are used for message-passing, and nonparametric BP 
(nBP) llT3).irT4l.lfl9l-ll23) where the BP-message is represented 
by samples of the corresponding PDF. When the pBP approach 
is used, there are errors from the Gaussian approximation; 
these errors decrease as problem size (N, M) increases. If 
the nBP approach is used, there is an approximation error 
which generally depends upon the choice of message sampling 
methods. 

B. Contribution 

In this paper, a low-computational Bayesian algorithm is de¬ 
veloped based on the nBP approach. We refer to the proposed 
algorithm as Bayesian hypothesis test using nonparametric 
belief propagation (BHT-BPj^J Differently from the pBP- 
based solvers, BHT-BP can precisely infer the multimodally 
distributed BP-messages via an uniform sampling-based nBP. 
Therefore, BHT-BP can be applied to any types of sparse 
signals in the CS framework by adaptively choosing a signal 
prior PDF. In addition, the proposed algorithm uses low- 
density parity-check codes ll24l (LDPC)-like sparse measure¬ 
ment matrices as works in ED, 03, HI. Although such sparse 
matrices perform worse than the dense matrices do in terms of 
compressing capability in the CS framework, they can highly 
speed up the generation of the CS measurements El- 

Most CS algorithms to date for the NSR problems have 
been developed under the auspices of signal estimation rather 
than support detection. However, recently studies have indi¬ 
cated that the existing estimation-based algorithms, such as 
Lasso 0, lead to a potentially large gap with respect to the 
theoretical limit for the noisy support recovery l28l - l30l . Moti¬ 
vated by such theoretical investigation, the proposed BHT-BP 
takes a joint detection-and-estimation structure E],E), as 
shown in Figj3] which consists of a sparse support detector 
and a nonzero estimator. The support detector uses uniform 

1 The MATLAB code of the proposed algorithm is available at our webpage, 
https://sites.google.com/site/jwkanglO/ 
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sampling-based nBP and composite binary hypothesis tests 
to the CS measurements Z at hand for the sparse support 
finding. Given the detected support, the underdetermined CS 
problem is reduced to an overdetermined problem. Then, the 
nonzero estimator is applied under the criterion of linear 
minimum mean-square-error (LMMSE) 113311 . Then, let us state 
the detailed novel points of the proposed algorithm. In the 
CS framework considering reconstruction of a sparse signal 
X £ from noisy measurements Z £ M m , BHT-BP is 
novel in terms of 

1) Providing robust support detection against additive mea¬ 
surement noise based on the criterion of the minimum 
detection error probability, 

2) Removing MSE degradation caused by the message 
sampling of the uniform sampling-based nBP using a 
joint detection-and-estimation structure, 

3) Handling sparse signals whose minimum nonzero value 
is regulated by a parameter .x m j n > 0, proposing a signal 
prior PDF for such signals, 

4) Providing fast sparse reconstruction with recovery com¬ 
plexity 0(N log N + I\M) where K is the signal 
sparsity. 

For the support detection of BHT-BP, we use a hypothesis- 
based detector designed under the criterion of the minimum 
detection error probability lf32l . BHT-BP represents the signal 
support using a binary vector, scalarwisely applying the hy¬ 
pothesis testing to each binary element for the support finding. 
This hypothesis test is “composite” because the likelihood 
for the test is associated with the value of each scalar X, L . 
Therefore, we calculate the likelihood under the Bayesian 
paradigm; then, the likelihood for the test is a function of 
the signal prior and the marginal posterior of X,. This is 
the reason why we refer to our support detection as Bayesian 
hypothesis test (BHT) detection. BHT-BP has noise robustness, 
outperforming the conventional algorithms, such as CS-BP 
ED. in the support detection. In this BHT detection, the nBP 
part takes a role to provide the marginal posterior of X,. 
Therefore, the advantage of BHT-BP in support detection can 
be claimed when the BP convergence is achieved with the 
sampling rate, above a certain threshold. 

Typically, recovery performance of the nBP-based algo¬ 
rithms is dominated by the message sampling methods. In 
the case of CS-BP ED, its performance is corrupted by 
quantization errors because CS-BP works with the uniform 
sampling-based nBP such that the signal estimate is directly 
obtained from a sampled posterior. The joint detection-and- 
estimation structure of BHT-BP overcomes this weakpoint 
of CS-BP, improving MSE performance. The key behind the 
improvement is that the sampled posterior is only used for the 
support detection in BHT-BP. Furthermore, BHT-BP closely 
approaches to the oracle performance]^] in high SNR regime, 
if the rate 4r are sufficiently maintained for the signal sparsity 
K. Fig[l] is an illustration intended to see a motivational 
evidence of the recovery performance among the proposed 
BHT-BP, CS-BP 03| and BCS fl2l . 

2 Here, the oracle performance means the performance of the LMMSE 
estimator having the knowledge of the sparse support set of the signal X. 



Fig. 1. An illustrative recovery example of BHT-BP (the proposed), CS-BP 
iH and BCS ED in the presence of noise. The original image, known as the 
Cameraman, of size N = 128 2 , is transformed via three step discrete wavelet 
transform. For this example, we pad zeros for the coefficients having values 
below 100 in wavelet domain, and we recover these images from M/N = 0.5 
undersampled measurements. From this example, we see that the recovered 
image via BCS includes flicker noise but which is not shown in that of BHT- 
BP in the noisy setup. In the clean setup, BHT-BP more clearly recovers the 
image than those of CS-BP and BCS. 

The importance of the minimum nonzero value :t; ln j n of 
sparse signals X in the NSR problems was highlighted by 
Wainwright et al. in Il27l . ll28l and Fletcher et al. in (30), where 
they proved that the perfect support recovery is very difficult 
even with arbitrarily large signal-to-noise ratio (SNR) if x m i n 
is very small. Following these works, in the present work, 
we consider recovery of X whose minimum nonzero value is 
regulated by x m i n . In addition, we propose to use a signal 
prior including the parameter a : m in, called spike-and-dented 
slab prior, investigating how the performance varies according 
to the parameter x rmn . We empirically show in the BHT-BP 
recover}]^] that variation of x rnln is reflected to the recovery 
performance in the form of SNR shift. In addition, we support 
this statement with a success rate analysis for the BHT support 
detection under the identity measurement matrix assumption, 
i.e ., = I. 

The recovery complexity of BHT-BP is 0(N\og N + KM) 
which includes the cost O(KM) of the FMMSE estimation 
and that of the BHT support detection 0(N log N). This is 
advantageous compared to that of the /1 -norm solvers f > (A r3 ) 
J6],(7| and BCS 0(NK 2 ) lfl2l . being comparable to that 
of the recent BP-based algorithms using sparse measurement 
matrices 0(N log N): CS-BP Q3 and SuPrEM fl6| . 

C. Organization 

The remainder of the paper is organized as follows. We first 
provide basic setup for our work in Section II. In Section III, 
we discuss our solution approach to the NSR problem. Section 

IV describes a nonparametric implementation of the BHT 
support detector and its computational complexity. Section 

V provides experimental validation to show performance and 
several aspects of the proposed algorithm, compared to the 

3 To the best of our knowledge, we have not seen CS algorithms including 
^min as an input parameter. 
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other related algorithms. Finally, we conclude this paper in 
Section VI. 


II. Basic Setup 

In this section, we introduce our signal model, and a factor 
graphical model for linear systems used in this work. 


A. Signal Model 

Let Xo £ R N denote a sparse vector which is a deterministic 
realization of a random vector X. Here, we assume that the 
elements of X are i.i.d., and each X,; belongs to the support 
set with a sparsity rate q £ [0,1). To indicate the supportive 
state of X, we use a state vector S £ {0, 1} jY whose each 
element Si is Bernoulli random with the rate q as following 

of 1 , if x i ^ 0 with q 
1 \ 0, if Xi = 0 with 1 -q ' 1 ’ 

Then, the signal sparsity, K = ||S|| 0 , becomes Binomial 
random with B(k; N. q). In the present work, we consider the 
signal xo whose minimum nonzero value is regulated by a 
parameter x m i n > 0. For such signal generation, 

• We first draw a state vector s by generating N i.i.d. 
Bernoulli numbers of ([T}. 

• Then, we assign zero value to the signal scalars corre¬ 
sponding to Si = 0, i. e. , xo,i = 0. 

• For the signal scalar corresponding to Si = 1, a Gaussian 
number is drawn from Af(x; 0, a\) and assigned to the 
signal scalar Xo,i if |aro,»| > x m i n \ otherwise, the number 
is redrawn until a realization with ]a:o,i | > imin occurs. 

For such signals with x m i n , we propose to use a spike-and- 
dented slab prior which is a variant of the spike-and-slab prior 
l(36l . According to Q. the signal prior of Xi can be described 
as a two-state mixture PDF with the state Si, i.e.. 


fxi {x) = (1 - q)fxi(x\Si = 0) + qf Xi {x\Si = 1). (2) 


Then, the spike-and-dented slab prior includes the conditional 
priors as following 


fxt (a:|5i 


fxt (a:|5i 


0) = 6(x), 


1) (X 


J\f(x, 0, (Jx ), \x\ R *Tmin 

A, |j?| <C a; m i n 


(3) 


where S(x) is the Dirac delta PDF and A > 0 is a near¬ 
zero constant. Fig|2]shows an example of the spike-and-dented 
slab prior where the prior is drawn with the parameters, q = 
0.05, ax = 5, cc m in = = 10 -4 , and normalized to be 

Jx, fxi(x)dx = 1 . 

The goal of the proposed algorithm is to recover the signal 
vector xq from a noisy measurement vector 


z = €>xo + w £ R m , 


(4) 


given a fat measurement matrix T> £ {0,1, —1} j1/xJV (M < 
N), where the vector w £ is a realization of a Gaussian 
random vector W ~ A/”(w; 0, ct^I); therefore, the vector 
z £ R A1 is drawn from a mean shifted Gaussian random 
vector conditioned on X = xo, i.e., Z ~ Af{ z; i&Xo, cr^I). For 
the measurement matrix d?, we consider an LDPC-like sparse 



Fig. 2. Example of spike-and-dented slab PDF in log-scale where the prior 
is drawn with the parameters, q = 0.05, ax = 5, x m i n = \ = 10“ 4 , 

and normalized to be f x ./A, (x)dx = 1. 


Support Detector 



Fig. 3. Diagrammatic representation of the proposed algorithm (when 
N = 6, M = 4, L = 2) where the inputs of the proposed algorithm is 
the measurement z = [zi, 22 , Z 3 , 24 ] and the output is a signal estimate xo. 
The proposed algorithm first detects the signal support s = [.s i..... from 
the measurements z at hand, and then applies linear MMSE estimation to find 
the signal estimate xq given the detected support s. 


matrix which has very low matrix sparsity (typically less than 
1% matrix sparsity) and the tree-structured property l25l . l26l . 
We regulate the matrix sparsity by the fixed column weight 
L such that IE [ 110 co i umn 11 2 ] = L. This regulation enables the 
matrix $ to span the measurement space with column vectors 
having equal energy. 


B. Factor Graphical Modeling of Linear Systems 

Factor graphs effectively represent such sparse linear sys¬ 
tems in (|4]i ITBl . Let V := {1,iV} denote a set of 
variable nodes corresponding to the signal elements, Xo = 
[£ 0 , 1 , ..., To, at], and C := {1 denote a set of fac¬ 

tor nodes corresponding to the measurement elements, z = 
[zi,..., Zm\- In addition, we define a set of edges connecting 
V and C as 8 := {(j, 1 ) £ C x V | fji = 1} where tf>ji is the 
(j, *)-th element of <1>. Then, a factor graph Q = (V, C. £) fully 
describes the neighboring relation in the sparse linear system. 
For convenience, we define the neighbor set of V and C as 
N v (i) ■■= {j £ C\(j,i) £ £} and N c {j) := {* £ V| (j,i) £ 
£}, respectively. Note that the column weight of the matrix $ 
is expressed as L = |iVy(i)| in this graph model. 
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III. Solution Approach of Proposed Algorithm 
The proposed algorithm, BHT-BP, has a joint detection-and- 
estimation structure where we first detect the sparse support 
by a combination of BP and BHT, then estimating nonzeros 
in the detected support by an LMMSE estimator, as shown in 
Fig|3] In this section, we provide our solution approach to the 
support detection and the nonzero estimation under the joint 
structure. 

A. Support Detection using Bayesian Hypothesis Testing 
1) Support detection in BHT-BP: The support detection 
problems can be decomposed to a sequence of binary state 
detection problems given the marginal posterior fxi(x \Z = z) 
of each signal scalar. Our state detection problem is to choose 
one between the two hypotheses: 



Decide 7f 0 


H a :S i = Q and Hi : S, = 1, 


given the measurements z. Our methodlogy to this problem 
is related to Bayesian composite hypothesis testing [32, p. 
198]. In contrast to the simple hypothesis test where the PDFs 
under both hypothesis are perfectly specified, the composite 
hypothesis test must consider associated random variables. In 
our problem, the associated random variable is Xf Then, the 
binary state detector decides "Hi if 

fz{z\Hi) = //z(z|Hi, Xj = x)f Xi {x\Hi)dx 

/z(zl'Ho) f fz(z\H 0 ,Xi = x)fx i (x\ / H 0 )dx 7 ’ 


where 7 is a threshold for the test. The PDF f z (z\H Si , Xi = x ) 
is simplified to fz{z\Xi = x) since the hypothesis H Si and 
the measurements Z are conditionally independent given Xi. 
Therefore, finally, the binary hypothesis test in can be 
rewritten as 


^BHTBp(z) 


f f x fx\S i = l) 
J fXi O) 
r fxfx IA=o) 
J fxfx) 


f Xi (x |Z 
f Xi (x\Z 


Z )dx U x 

~ — ^ 7 

z )dx n o 


( 6 ) 


where the Bayesian rule is applied to fz{z\Xi = x) = 

/Xi(a /.Y~(x) /z(Z) ’ and obviousl y fxi(x\H Si ) = fxi(x\Si = Si) 

holds from the prior knowledge of ([ 2 ji. 

In some detection problem under Bayesian paradigm, one 
can reasonably assign prior probabilities to the hypotheses. 
In the present work, we assign the sparsity rate q to the 
hypotheses, i.e., PrjTfo} = 1 — q and Prj'Hi} = q. Then, 
we can define the state error rate (SER) of the scalar state 
detection <[ 6 | [32, p. 78] 


Pser ■= Pr{si ^ Si\H 0 }(l -q) + Pr{s"j + s^H^q. (7) 


It is well known that the threshold 7 of © can be optimized 
under the criterion of the minimum detection error probability 
with the SER expression ([Til. By the criterion, we assign the 
threshold to 7 = 7 * := ~fp L - We omit the derivation for 
this threshold optimization here, referring interested readers 
to [32, p. 90]. We call this binary hypothesis test (| 6 ) with the 
threshold 7 * as Bayesian hypothesis test (BHT) detection. The 
proposed algorithm generates a detected support s £ { 0 , 1 } N 
according to the results of a sequence of BHTs. Therefore, 
given a marginal posterior of each Xi, BHT-BP can robustly 


Fig. 4. Fig[4]illustrates the scalar state detection by BHT under an assumption 
of <b — I. Under this assumption, “the hypothesis test given a vector z " is 
simplified to “the test given a scalar zf \ described in {iij, where the threshold 
7 ' is derived from the equality condition of j9j. In the figure, the horizontal- 
lined region (blue) represents Pr{3) 7 = s , \ 'H\ [ and the vertical-lined (red) 
region does PrjSi f zfhLi) [. 


detect the signal support even when the measurements are 
noisy. 

Figj4] illustrates the scalar state detection of BHT-BP when 
the matrix is $ = I such that the measurement channel can 
be decoupled to N scalar Gaussian channels, i.e., Zj = X, + 
Wj, (1 = j). Under this assumption, “the hypothesis test given 
a vector z” can be scalarwise to “the test given a scalar zf\ 
being simplified 

fj£C:\ Zj \%i ( 8 ) 

Ho 

where the threshold 7 ' is derived from the equality condition 
with the two scalar likelihood and the threshold 7 * = 

/*>!«.) ■ ,,, 

/z,(z|H 0 ) 

Hence, the threshold 7 ' is a function of ax, &w, £ m i n , 
and q (see Appendix II). With this threshold 7 ', we can 
find the conditional SER, Pr{si Si\H Si }, for the case 
$ = I. In Fig|4] the horizontal-lined region (blue) represents 
Pr{si Si\Hi} and the vertical-lined (red) region does 

Pr{si Si\Ho}- The corresponding SER analysis will be 

provided in Appendix II. Although Fig|4]does not show typical 
behavior of the BHT detection given a vector measurement z, 
the figure helps intuitive understanding of the BHT detection. 

In addition, it is noteworthy in Fig|4] that the shape of 
fzj ( z\Hi ) is dented near Zj = 0. This is caused by the use of 
the spike-and-dented slab prior, given in Q, where the dented 
part varies with the parameter x m i n . 

2) Support detection of CS-BP: Support detection is not 
performed in practical recovery of CS-BP, but we describe it 
here for a comparison purpose. CS-BP estimates the sparse 
solution Xo directly from a BP approximation of the signal 
posterior, through MAP or MMSE estimation. Let us consider 



















5 


CS-BP using the MAP estimation. Then, given the marginal 
posterior fxi(x |Z = z), the scalar state detection of CS-BP 
is equivalent to choose one of the two peaks at x = 0 and 
x = xmap.i := arg max Jx, (x |Z = z). Namely, the binary 
state detector of CS-BP decides Pi\ if 


^CSBp( Z ) * — 


Pr{tCMAP,i - Ax < Xj < x M Apg + Ax\Z = z} 
Pr{0 — Ax < Xi < 0 + Ax\Z = z} 

fsZ’i-Ax fxMZ = z)dx 


In—A-r fxi(x\Z = z)dx 


> 1 , 


( 10 ) 


where Ax is a small quantity that we eventually let approach 
to 0. When £map,! = 0, the test cost becomes one; then, 
the detector immediately decides PLo- Hence, in CS-BP, the 
detected support s is just a by-product of the signal estimate 
Xmap, which is not robust support detection against additive 
measurement noise. 


for the BP convergence. The asymptotic condition used in 
© is not always necessary if the tree-structured property 
is guaranteed for the matrix $ because the main reason to use 
the asymptotic condition in the works of El, 138) is to make 
the system “asymptotically cycle-free”, which is equivalent to 
having an “asymptotically tree-structured” matrix <tj^] Thus, 
we claim the advantage of BHT-BP over CS-BP in support 
detection with a certain threshold of the rate fi . Given ^ 
below the threshold, the BP convergence is not achieved such 
that the likelihood ,/z ( z jL/.. s ,) is not properly calculated for the 
BHT detection. We will empirically find the threshold using 
information entropy of the approximate marginal posterior, 
fsp(x )( X \Z = z ). in Section V-A. Although we do not 
provide an analytical threshold of for the BP convergence 
in this paper, simulation results with above the empirical 
threshold are quite favorable, as shown in Section V-B and -C. 


B. Conditions for BP Convergence 

In the proposed algorithm, marginal posterior of each Xi, 
fxi(x\Z = z), for the BHT detection is computed by BP. 
It was known that BP efficiently computes such marginal 
posteriors, achieving its convergence if the conditions 
in Note 1 are satisfied B3. Given the BP convergence, 
each approximate marginal posterior converges to a PDF 
peaked at an unique value x t during the iteration. In noiseless 
setup, the unique value is exactly the true value, i.e., x, = Xo,i. 


Note 1 (Conditions for BP convergence): 

• The factor graph, which corresponds to the relation 
between X and Z, has a tree-structure. 

• Sufficiently large number of iterations l is maintained 
such that BP-messages have been propagated along every 
link of the tree, and a variable node has received messages 
from all the other variables nodes. 

Although the second condition in Note 1 is practically de¬ 
manding, it has been reported that BP provides a good 
approximation of marginal posteriors even with factor graphs 
including cycles, which is called loopy BP lf25l . lf39l . l4Ql . 

A related argument for BP was stated by Guo and Wang 
in the context of the multiuser detection problem of CDMA 
systems, where the problem is actually equivalent to solve a 
linear system (37l . l38l . In the works, Guo and Wang showed 
that the marginal posterior computed by BP is almost exact 
in a large linear system (M, N —► oo) if the factor graph 
corresponding to the matrix $ is asymptotically cycle-free and 
the sampling rate ^ is above a certain thresholcj 4 ] Namely, 
Guo and Wang showed that 


(0 


BP(X,)( X I Z = Z ) - fxM\Z = Z ) 


= 0 , ( 11 ) 


lim limsup 

Z-S-oo M,IV-too 

where /b p(x )( x \Z = z ) I s an approximate marginal posterior 
of each X, by l iterations of BP. 

According to the literature, in the linear system with the 
LDPC-like matrix T>, the sampling rate 4T is the only obstacle 


4 In f37'i, f381 . the authors considered the sampling rate f above one. 


C. LMMSE Estimation of Nonzero Values 

Given the support information by the BHT detection, the 
rest of the work is reduced to the nonzero estimation problem, 
represented as 

x 0 = E [X|S = s, Z = z], (12) 


and it can straightforwardly solved by combining the nonzero 
position by s and the nonzero values given by the LMMSE 
estimate [33, p. 364] 


x o, 






’w 


-1 




(13) 


’w 


where d>s £ {0, lj Mxif denotes a submatrix of $ that con¬ 
tains only the columns corresponding to the detected support 
s, <j\ are the variance of an nonzero scalar X ,. 

The estimate Xo from the proposed joint detection-and- 
estimation structure is not optimal. As we have seen, our 
support detector (|6| is based on the criterion of minimum 
detection error probability. Even with this detector, however, 
we cannot guarantee the estimation optimality since the 
LMMSE estimator of © is not designed from the cost 
function involving the detection part El,ST]. Nevertheless, 
worth mentioning here is that the proposed joint structure has 
advantages as given in Note 2. 


Note 2 (Claims from the joint detection-and-estimation 
structure): 

• Removing the MSE degradation caused by the uniform 
sampling-based nBP. 

• Achieving the oracle performance in the high SNR regime 
with the sufficiently high rate ji for the BP convergence. 

We will empirically validate this claim in Section V-C. 


IV. Nonparametric Implementation of BHT 
Support Detector 

This section describes a nonparametric implementation of 
the proposed support detector consisting of BP and the BHT 

5 If the graph corresponding to the matrix $ has at least one cycle, the BP 
convergence cannot be rigorously guaranteed. 
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detection. We discuss our nonparametric approach of the BP 
part first, and then explain the BHT detection part. This BHT 
support detection is summarized in Algorithm |T] 


A. Nonparametric BP using Uniform Sampling 

In the BHT support detector, the BP part provides the 
marginal posterior of X, for the hypothesis test in <]6ji. Since 
the signal x 0 is real valued, each BP-message takes the form 
of a PDF, and the BP-iteration becomes a density-message- 
passing process. To implement the density-message-passing, 
we take the nBP approach H2-E2. Many nBP algorithms 
have been proposed according to several message sampling 
methods such as discarding samples having low probability 
density l20l . adaptive sampling ED, Gibbs sampling m, 
rejection sampling (22), or importance sampling |23l . 

Our nBP approach is to use an uniform sampling for the 
message representation where we set the sampling step T s on 
the basis of the three sigma-rule (35) such that 


T s = 


2 • 3ax 


(14) 


where Nd is the number of samples to store a BP-message. 
Then, we define the uniform sampling of a density-message 

f(x) as 


Samp {f(x);T s } := f(mT s - 3cr. Y ) 

= f[m], for m G {0,1,..., N d - 1}, (15) 

where Samp{-;T s } denotes the uniform sampling function 
with the step size T s . Hence, the sampled message f[m\ can 
be treated as a vector with size N d by omitting the index 
m. The main strength of the uniform sampling-based nBP is 
adaptivity to various signal prior PDFs. In addition, we note 
that the calculation of uniformly sampled messages can be 
accelerated using the Fast fourier transform (FFT). 

Consider the factor graph Q = (V, C. £) depicted in the 
support detection part of Fig J3] where a signal element X, 
corresponds to a variable node * G V and a measurement 
element Z t corresponds to a factor node j G C. At every 
iteration, messages are first passed from each variable node 
i G V to its neighboring factor nodes 7Vy(i); each factor 
nodes j G C then calculates messages to pass back to the 
neighboring variable nodes Nc(j) based on the previously 
received messages. These factor-to-variable (FtV) messages 
include extrinsic information of A’, , and will then be employed 
for the computation of updated variable-to-factor (VtF) mes¬ 
sages in the next iteration. (For the detail, see the paper (181 ). 

Let a(E G [0,1)^ denote a sampled VtF message at the 
Z-th iteration in the vector form, given as 


Al) 




» = ? ? 


PXi x 


n 


Ai- 1) 


k&N v (i)\{j} 


V(j,i)e£, (16) 


where all product operations are elementwise, the vector 
PA'; € [0,1)^ denotes the sampled signal prior, i.e ., p y, := 
Samp{/xi (x), T s }, and rj[-] is a normalization function to 


make E a(E = 1- The sampled FtV message at the Z-th 
iteration, G [0,1) ' Vrf , is defined as 


h< jL = Pz,|x ® I 0 afU V(j,t)G^ (17) 
\keN c (j)\{i } J 

where <g> is the operator for the linear convolution of vectors, 
and the vector p z | X G [0. 1) Nd is the sampled measurement 
PDF, i.e., Pz j |x := Samp{A/"(^; ($X)j-, a^), T s }. 

The convolution operations in ( fl7] > can be efficiently com¬ 
puted by using FFT. Accordingly, we can rewrite the FtV 
message calculation as 


b^,; = T - 1 \ Tp Z] \x x ( II ) \ ( 18 ) 

{ \keNc(j)\{i} ) ) 

where T denotes the FFT operation. Therefore, for efficient 
use of FFT, the sampling step T s should be appropriately 
chosen such that N d is power of two. In fact, the use of FFT 
brings a small calculation gap since the FFT-based calculation 
performs a circular convolution. However, this gap can be 
ignored, especially when the messages take the form of bell¬ 
shaped PDFs such as Gaussian PDFs. 

The sampled approximation of the marginal posterior of 
each Xi, i.e., p® |z := Samp{f$ (XA (x\Z = z),T s }, is 
produced by using the FtV message ( fl7j ) for every i G V. 
Namely, 


(0 

Pa,|z =V 


Pa, x 


n i 

keN v (i) 


Ai- 1) 


Vi G V, 


(19) 


To terminate the BP loop, we test the condition at every 
iteration, which is given as 



where e > 0 is a constant for the termination condition. If 
the condition given in ( f20) is satisfied, the BP loop will be 
terminated. After the BP termination, we can simply express 
the marginal posterior of Xi by dropping out the iteration 
index Z, i.e., p x, \ z• 


B. BHT Detection using Sampled Marginal Posterior 

We perform the hypothesis test in © by scaling it in 
logarithm. Using the sampled marginal posterior obtained 
from the BP part, an nonparametric implementation of the 
hypothesis test in © is given as 


log 


E r i x Pa,|z h P , 

^ log 

2>o x Pa-,|z « 0 


1 -g 

q 


( 21 ) 


where x is elementwise multiplication of vectors, and r 0 , i - ! G 
R ;V '' are reference vectors from the signal prior knowledge, 
defined as 


PA,|S;=0 PAi|Si=l 

r 0 := -, n := -. 

Pa 4 Pa, 


( 22 ) 


This BHT-based detector is only compatible with the nBP 
approach because the BHT detection requires full information 
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TABLE I 

List of algorithms in the performance validation 


Algorithms 

Complexity 

Type of $ 

Type of Prior PDFs 

Utilized Techniques 

BHT-BP (Proposed) 

0(N log N + KM) 

LDPC-like 

Spike-and-dented slab 

nBP 

CS-BP 03 

0(N log N) 

LDPC-like 

Spike-and-dented slab 

nBP 

SuPrEM 03 

0(N log N) 

LDF 

Two-layer Gaussian with Jeffery 

EM, pBP 

BCS d 

0(NK 2 ) 

LDPC-like 

Two-layer Gaussian with Gamma 

EM 

Zi-DS 13 

fl(N 3 ) 

Std. Gaussian 

- 

CVX opt. 


Algorithm 1 BHT support detection 

Inputs: Noisy measurements z, measurement matrix <t>, spar¬ 
sity rate q, sampled prior PDF px, sampled measurement 
PDF PZj |x> The number of samples N d , Termination con¬ 
dition £. 

Outputs: Reconstructed signal xq. Detected support vector s. 


1) Belief propagation: 

set b { '~^ = 1 for all ( j , i) G £ 


j-h 

N 

while ± £ 

V(j, i) G £■■ 


■ 

RX ; | Z 


P.v, I z 


I 

■ X; |Z | 


>edo 


Vi G V: 

set p 

end while 


w .= 

V 

PXi X 

n 

b ( ^ 1} 



keN v (i)\{j} 

= 

Pzjx ® 

<g> 

a (0 




\keN c {j)\{i} 

- 

x,\z 

V 

|" a (/) 

x b (i_1) l 
x D j *-H 



2) BHT detection: 

Vi G V: 


if log 


H r l XPXj |Z 

Hi r 0 xp x .|z 


> log T-2 then set Si = 1 


else set = 0 
end if 


on the multimodally distributed posterior of X,; which cannot 
be provided through the pBP approach. 

C. Computational Complexity 

In our uniform sampling-based nBP, the density-messages 
are vectors with size N d . Therefore, the decoder requires 
0(LN d ) flops to calculate a VtF message aand 
0( N ^ d log N d ) flops for a FtV message b^ i per iteration. 
In addition, the cost of the FFT-based convolution given 
in © spends 0(N d log N d ) flop if we assume the row 
weight is NL/M in average sense. Hence, the per-iteration 
cost of the uniform sampling-based nBP is 0(NLN d + 
M^^\ogN d ) « 0(NLN d \ogN d ) flops. For the BHT 
detection, the decoder requires 0(N d ) flops to generate the 
likelihood ratio of which is much smaller than that of 


the BP part. Therefore, the cost for the BHT detection can be 
ignored. 

For the linear MMSE estimation to find nonzeros on the 
support, the cost can be reduced upto O(KM) flops by apply¬ 
ing QR decomposition l34l . Thus, the total complexity of the 
proposed algorithm is 0(1* x NLN d logN d + KM) flops 
and it is further simplified to 0(1* x N+KM ) since L and N d 
are fixed constants. In addition, it is known that the message¬ 
passing process is applied recursively until messages have 
been propagated along with every edge in the tree-structured 
graph, and every signal element has received messages from all 
of its neighborhood, which requires l* = 0(\ogN) iterations 
03,(25],@2). Therefore, we finally obtain 0(N log N+KM) 
for the complexity of the proposed algorithm, BHT-BP. 


V. Performance Validation 

We validate performance of the proposed algorithm, BHT- 
BP, with extensive experimental results. Four types of experi¬ 
mental results are discussed in this section, as given below: 

1) Threshold (^)* for BP convergence, 

2) Support detection performance over SNR, 

3) MSE comparison to recent algorithms over SNR, 

4) Empirical calibration of BHT-BP over N d and L. 

The support detection performance is evaluated in terms of the 
success rate of perfect support detection, defined as 

Psucc := Pr{s = s|Z = z}, (23) 


and the MSE comparison to the other algorithms is performed 
in terms of normalized MSE, given as 


MSE := 


Po -x 0 ||3 

WMl 


(24) 


We generate all the experimental results by averaging the 
measures, given in ([23) and ( |24) , with respect to the signal 
Xo and the additive noise w using Monte Carlo method*] In 
addition, we define a SNR measure used in the experiment as 


SNR 


101 og 10 


Ma 2 w 


(dB) . 


(25) 


For a comparison purpose, in this validation, we include 
several recent Bayesian algorithms, CS-BP ©, BCS Q2. 
and SuPrEM [Qh], as well as an Zi-norm based algorithm, 


6 At every Monte Carlo trial, we realize xo and w to produce a measurement 
vector z given the matrix 





























(a) M/N=0.25, q=0.05 (b) M/N=0.375, q=0.05 (c) M/N=0.5, q=0.05 
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Fig. 6. Experimental result for the success rate of support detection over SNR for a variety of Xmin where we consider the case of N = 1024, L = 5, and 
&x = 5. In Fig[6] we plot the success rate of of BHT-BP (proposed) and CS-BP 1131 together with the analytic bound for the case of = I, where the 
nBP part of the both algorithms is implemented with N^ = 256, £ = 10 —5 , A = 10 —4 . 



Fig. 5. The entropy phase transition curve over the sampling rate for 
a variety of the signal length N and the sparsity rate q where we set the 
threshold (^)* to the point achieving ^2iLi h(Xi\Z = z) < 10 -3 , 
which is given in Table |TT1 These curves are information entropy of the ap- 
proximate marginal posterior, |Z — z )> drawn with the parameters 

tTx = 5, Xmin = ax /4, N^ = 256, e = 10 5 , A = 10 —4 and a noiseless 
setup. In addition, we set the column weight of the matrix 4? to L = 4 for 
N = 512,1024, and L = 5 for N = 4096, in this experiment. 


l\ -DS J60 We provide brief introduction to the Bayesian 
algorithms in Appendix I for interested readers. In this val¬ 
idation, BHT-BP and CS-BP use the spike-and-dented slab 
prior, given in 0 by applying the uniform sampling, i.e., 
p\- ; := Sampj/jQ (x), T s }. Worth mentioning here is that 
the nBP-based solvers, such as BHT-BP and CS-BP, are only 
compatible with such an unusual signal prior, like the spike- 
and-dented slab prior, which is one main advantage of the 
nBP solvers. For the measurement matrix <1\ we basically 


7 The source codes of those algorithms are obtained from each author’s 
webpage. For CS-BP, we implemented it by applying the uniform sampling- 
based nBP introduced in Section IV-A. 


consider a LDPC-like matrix in BHT-BP, CS-BP and BCS. 
In case of SuPrEM, a LDF matrix is used for the measure¬ 
ment generatiorf] and Zi-DS is performed with the standard 
Gaussian matrix as a benchmark of the CS recovery. For fair 
compariosn, all types of the matrices <t> are equalized to have 


the same column energy, i.e. E ||0 co iumn| 


= l a therefore. 


each entry 0 :p of the standard Gaussian matrix is drawn from 
A^;0,^). Table § summarizes all the algorithms included 
in this performance validation. 


A. Threshold for BP Convergence, 

We claimed the advantage of BHT-BP over CS-BP in 
support detection with the rate above a certain threshold 
(^)* in Section III-B. Given the rate ^ > (^)*, a BP 
approximation of the marginal posterior p( X )(x\Z = z) 
contains sufficiently less uncertainty on the true value xo,i- We 
empirically find the threshold ( ^ ) in a noiseless setup using 
the average information entropy, -E h(Xi\Z = z) which 
measures uncertainty of /bp(x )( x |Z = z )- The empirical 
entropy curves in Fig|5j show sharp phase transition as 3 
increases. From the result, we set the threshold to the point 
achieving jfYl^i h ( X i\ z = z ) < 10 3 , which is given 
in Table [II] for a variety of the signal length N and the 
sparsity rate q. We also note from Fig |5] that the entropy phase 
transition becomes sharper as N increases. 


B. Support Detection Performance over SNR 

Figj6] depicts an experimental comparison of the success 
rate, defined in ( |23j ), between BHT-BP and CS-BP over SNR 
for a variety of x m i n . According to the threshold (jf)* given 
in Table [II] the BP convergence is achieved only for the cases 

8 SuPrEM is only compatible with the LDF matrix which was autonomously 
proposed in the work ESI- 
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TABLE II 

Empirical Threshold ( M/N)* for the BP convergence 


Sparsity rate 

N = 512 

N = 1024 

N = 4096 

q = 0.05 

0.325 

0.25 

0.25 

q = 0.1 

0.575 

0.50 

0.475 


of (a),(b),(c),(f) in Figj 6 ] Therefore, we confine our discussion 
here to such cases, claiming the advantage of BHT-BP over 
CS-BP in support detection. 

1) SNR gain by BHT support detection : The empirical 
results of Fig[ 6 ] validate our claim that BHT-BP has more 
robust support detection ability against noise, than CS-BP 
Indeed, Fig J 6 ] shows that BHT-BP enjoys a remarkable SNR 
gain from CS-BP in the low SNR regime. This SNR gain 
is from difference of the detection criterion as discussed in 
Section III-A. As SNR increases, the success rate of the both 
algorithms gradually approach to one. In the high SNR regime, 
BHT-BP and CS-BP do not have notable difference in the 
performance. 

We support the advantage of BHT-BP over CS-BP with 
Figj7] This figure depicts an exemplary marginal posterior, 
obtained from the BP part, according to two different SNR 
levels, SNR=10 and 30dB, where the true value of X, is 
Xo t i = —4.0; hence Sj = 1. 

• When SNR is sufficiently high such as the SNR=30 dB 
case, both of the algorithms can successfully detect the 
state Si from the posterior since the probability mass is 
concentrated on the true value £ 0 ,i- 

• When SNR is low such as the SNR=10 dB case, however, 
CS-BP may result in misdetection because the point-mass 
at x = 0 is higher than the point-mass at Xo,i = —4.0 
due to the additive noise, leading to Sj = 0. In contrast, 
the BHT detector decides the state S t by incorporating 
all the spread mass due the noise. This is based on that 
the likelihoods fz(z\H s .), which construct the hypothesis 
test of 0 . is associated with the entire range of the .x'-axis 
rather than a specific point-mass. Therefore, BHT-BP can 
generate s) = 1 and success in the detection even when 
SNR is low. 

2) Analytic Bound of BHT detection when <I> = I: Fig| 6 ] 
includes an analytic bound of the BHT detection for the case 
that the measurement matrix is an identity matrix, i.e. 4» — I, 
such that there is no performance degradation from lack of 
measurements. Therefore, this bound provides a performance 
benchmark of the BHT detection when > (j^-) , because 
exact marginal posteriors are given to the BHT detector under 
the assumption of $ = I. We refer to Appendix II for 
the detailed derivation of the analytic bound. This derivation 
reveals that the bound is a function of q, x lnin , and SNR. In 
Fig J 6 ] it is clearly shown that the empirical points are fit into 
the analytic bounds as 4 f increases. 

3) Support detection with x rnln : Fig( 6 ] also shows the 
support detection behavior according to x mm , confirming that 
x m i n is a key parameter in the NSR problem. From Figj 6 ] 
we have the observation as given in Note 3. 

Note 3 (Empirical observations for x tnill ): 



Fig. 7. Example of the approximate marginal posterior fgp^ x ^(cc|Z = z), 
obtained from the nBP part, for two different SNRs: 10 dB and 30 dB, where 
the true value of Xi is xoy = —4.0, and the other parameters are set to 
M/N = 0.5, q = 0.05, ax = 5, l* = 30, and the minimum value is 

Xrnin — / 4. 


• All the success rate curve shift toward high SNR region 
as x m i n decreases. 

• Extremely, when x m in = 0, the experimental points stay 
near zero even with ^ > (y) and high SNR. 

These empirical observations intuitively tells us that contribu¬ 
tion of x m i n is as significant as SNR in the NSR problem, 
implicating that we need SNR —> oo for the perfect support 
recovery if the signal has x m in -A 0. Note that our interpre¬ 
tation on the result here shows good agreement with not only 
our analytic bound under the assumption of $ = I, but also 
the information-theoretical results ED, ED, ED showing that 
support recovery is arbitrarily difficult by sending x m in — > 0 
even as SNR becomes arbitrarily large. 


C. MSE Comparison to Recent Algorithms over SNR 

In Fig(8]and Fig|9} we provide an MSE comparison among 
the algorithms listed in Table [I] and the support-aware oracle 
estimator over SNR for a variety of where MSE* 

denotes the performance of the support-aware oracle estimator, 
given as 


Tr 


MSE* := 


(A 




-l 


EIIXII 


(26) 


In this section, we discuss the comparison result by cate¬ 
gorizing the setup of into two cases: the “region of 

% > atK * the “region of ^ cases, according 

to the empirical threshold (^)* given in Table [nj where we 
fix the parameters N = 1024, L = 5, ax = 5, x m i n = ax /4. 

1) MSE performance in region of •' With 

Fig|8] we argue that in the region of jt > () , BHT-BP 
catches up with the oracle performance, MSE*, beyond the 
SNR point allowing the accurate support finding. Fig|8]-(b) 
and -(c) validate our claim by showing that the BHT-BP curve 
coincides very closely with the MSE* curve beyond a certain 
SNR point. Worth mentioning here is that the SNR point. 
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SNR <dB> 




LU 
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(d) M/N=0.5, q=0.1 
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— SuPrEM 
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Fig. 8 . MSE comparison among the algorithms (BHT-BP (Proposed), CS- 
BP Q3, SuPrEM El, BCS 1121 . / 1 -DS 0) over SNR where we consider 
signal recovery with ^ ) . We simulate the MSE performance under 

N = 1024, L = 5, rrx = 5, i m j„ = ax /4. The nBP part embedded in 
BHT-BP (proposed) and CS-BP is implemented with N,[ = 128 and e = 
10" 5 , A = 1CT 4 . 


which starts to achieve the oracle MSE*, nearly corresponds 
to the point which attains the perfect support detection with 
P succ ~ 1.0 in Fig[ 6 ] For the cases of Fig[ 8 ]-(a) and -(d), the 
BHT-BP curve does not fit to the oracle MSE* at the high SNR 
region. The reason is coming from lack of measurements for 
the BP convergence. Indeed, it is observed from Fig[5]that the 
entropy points corresponding to (=V,g) of Fig| 8 j(a) and -(d) 
is in not a steady region but a transient region. This means 
that the corresponding posterior includes residual uncertainty 
on Xi. Although this residual uncertainty does not remarkably 
work in the low SNR region due to noise effect, it is gradually 
exposed as SNR increases, degrading the MSE performance 
in the high SNR region. 

In Fig( 8 j the CS-BP curve forms an error floor as SNR 
increases, leading to a MSE gap from BHT-BP in the high 
SNR regime. This MSE gap is mainly caused by the quan¬ 
tization error of the nBP. Since CS-BP obtains its estimate 
directly from the sampled posteriors, the quantization error is 
unavoidable, leading to an error floor. The level of the floor 
can be approximately predicted by the MSE degradation of 
the quantization, given as 

E ||<5r s [Xg] — Xsllj TH 12 _ 3 

KIM! Nj (Z/) 

where Qt s [•] is the quantization function with the step size 
T s given in (fl4|), and Xs is a random vector on the signal 


fa) M/N=0.25, q=0.1 



10 20 30 40 50 

SNR <dB> 


(b) M/N=0.375, q=0.1 



SNR <dB> 


Fig. 9. MSE comparison among the algorithms (BHT-BP (Proposed), CS- 
BP Oil, SuPrEM Q2), BCS H3, Zi-DS ( 6 )) over SNR where we consider 
signal recovery with “sf < ( 7 ^ j • We simulate the MSE performance under 
N = 1024, L - 5, ax — 5, = crx/4. The nBP part embedded in 

BHT-BP (proposed) and CS-BP is implemented with = 128 and e = 
10“ 5 ,A = 10“ 4 . 


support S. Under our joint detection-and-estimation structure, 
the LMMSE estimator m enables BHT-BP to go beyond the 
error floor. 

For SuPrEM, the performance is poor to the other algo¬ 
rithms in the experimental results of Fig| 8 ] But, it is not 
surprising since SuPrEM is basically for signals having fixed 
signal sparsity /ij^] Indeed, the SuPrEM algorithm requires the 
sparsity K as an input parameter. However, in many cases, 
the signal sparsity K is unknown and random. In our basic 
setup, recall that we assumed signals having Binomial random 
sparsity, i.e., K ~ B(k: N. q). Therefore, naturally SuPrEM 
underperforms the other algorithms in this experiment. l\ -DS 
and BCS are comparable to BHT-BP, but l \ -DS has a certain 
SNR loss from the BHT-BP over all range of SNR, and BCS 
shows an error floor at high SNR region. 

2) MSE performance in region °/ 77 < (y) .'In 

Fig[9] we investigate the MSE comparison in the region of 
W < (x)*- Under the setup of 77 = 0.25 ,q = 0.1, every 
algorithm generally does not work as shown in Fig[9j.(a). From 
the setup of = 0.375, q = 0.1, all the algorithms begin to 
find signals but, BHT-BP underperforms BCS, Ll-DS, and 
SuPrEM in this setup, as shown in Fig|9]-(b). The reason is 
that in the region of 77 < (77) , the BP does not converge 
properly due lack of the measurements such that probability 
mass on the true value Xo,i is not dominant in the approximate 
marginal posteriors, as we discussed in Section III-B. From 
the results, we conclude that BHT-BP is not advantageous 
over the other algorithms excluding CS-BP when sufficient 
measurements is not maintained for the signal sparsity. 


D. Empirical Calibration of BHT-BP over N,/ and L 

1) Number of samples Nd for nBP : From the discussion in 
Section IV-C, one can argue that the complexity of BHT-BP is 
highly sensitive to the number of samples Nd', therefore, BHT- 
BP cannot be low-computational in a certain case. It is true, but 

9 We empirically confirmed that when K is fixed, SuPrEM works as 
comparable to BHT-BP even though we does not include that result in this 
paper. 
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N d 


— □ BHT-BP (proposed) 

. Equ. (27) 

O CS-BP 

- MSE* (Oracle) 


Fig. 10. MSE comparison of BHT-BP (proposed) and CS-BP fl3l with clean 
measurements (SNR=50dB) over N d £ {8,16, 32, 64,128, 256, 512,1024} 
for a variety of where we plot the MSE curves together with the MSE 

degradation by the quantization error, given in (27). In this experiment, we 
consider a case of N = 1024, L = 5,:r m i n = c>x/4, e = 10 —5 ,A = 
10 -4 . 

we claim that the effect of N d is limited in the BHT-BP recov¬ 
ery. To support our claim, Fig|T0] compares MSE performance 
of BHT-BP, CS-BP, and the support-aware oracle estimator 
as a function of N d in a clean setup (SNR=50 dB) where we 
plot the MSE curves together with the MSE degradation by the 
quantization error, given in ( |27| . From Fig|T0} we confirm that 
BHT-BP can achieve the oracle performance if N d is beyond a 
certain level and belongs to the success phase, whereas 

CS-BP cannot provide the oracle performance even as N d 
increases. Consequently, N d does not significantly contribute 
to the MSE of the BHT-BP recovery once N d exceeds a certain 
level. This result implies that the complexity of the BHT- 
BP recovery can be steady with a constant N d in practice. 
Therefore, BHT-BP can holds the low-computational property 
given by the BP philosophy. In addition, we confirm from 
FigJTO] that the MSE of CS-BP is bounded by ((27). 

2) Column weight L of LDPC-like matrices : Another 
interesting question is how to determine the column weight 
L of the LDPC-like matrix $ for BHT-BP. Fig[TT] provides 
an answer for this question by showing the MSE of the 
BHT-BP recovery as a function of L, where we consider 
the recovery from clean measurements (SNR= 50 dB). When 
ft is sufficiently large, for example =0.75 as shown in 
FigfTT|-(b), the BHT-BP recovery generally becomes accurate 
as L increases. Then, the accuracy is almost constant after 
a certain point L = L*. On the other hands, when ^ is 
not sufficient, for example ^ = 0.5 as shown in Fig[n} 
(a), the recovery accuracy rather can be degraded beyond a 
certain point L*. The reason is that when V is small, the 
large L spoils the tree-structured property of the matrix <1\ 
reducing the accuracy of the marginal posterior approximation 



Fig. 11. MSE performance of the BHT-BP recovery over the column 
weight L of the measurement matrix where we set N = 1024, N d = 
128, Xmin = gx /4, £ = 10~ 5 , A = 10 -4 and consider the recovery from 
clean measurements (SNR= 50dB). 

by BP (251,1261. Therefore, L should keep as small as possible 
once the desirable recovery accuracy is achieved. In the case 
of FigJTl] we empirically set L* =6,5 for ^ = 0.5,0.75 
respectively. 

From the calibration shown in Fig [10] and Fig |TT] we support 
our claim that the computational cost of BHT-BP can be 
0(N log N + KM) in practice by fixing L and N d , as we 
discussed in Section IV-C. 

VI. Conclusion 

The theoretical and empirical research in this paper 
demonstrated that BHT-BP is powerful as not only a low- 
computational solver, but also a noise-robust solver. In BHT- 
BP, we employed a joint detection-and-estimation structure 
consisting of the BHT support detection and the LMMSE 
estimation for the nonzeros on the signal support. We have 
shown that the BHT-BP detects the signal support based on 
a sequence of binary hypothesis tests, which is related to 
the criterion of the minimum detection error probability. This 
support detection approach brings SNR gain of BHT-BP from 
CS-BP ED, which is an existing nBP-based algorithm, for 
the support detection. In addition, we noted the fact that 
BHT-BP effectively removes the quantization error of the nBP 
approach in the signal recovery. We have claimed that our joint 
detection-and-estimation strategy prevents from degrading the 
MSE by the quantization error. We have supported the claim 
based on an empirical result that the performance of BHT-BP 
achieves the oracle performance when sufficient measurements 
is maintained for the signal sparsity. Furthermore, we confirm 
the impact of x rnrn on the noisy sparse recovery (NSR) 
problem via BHT-BP. Based on the empirical evidence, we 
showed that exact sparse recovery with small x m i n is very 
demanding unless sufficiently large SNR is provided, which is 
an agreement with the result of Il27l . ll28l . ll30l that emphasizes 
the importance of x m i n in the NSR problem. 

Appendix I 

Brief Introduction to 
Recent Bayesian Algorithms 

In this appendix, we provide a brief introduction to some 
previously proposed Bayesian algorithms for the NSR prob- 
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lem: BCS 02), CS-BP Q3], SuPrEM OH. These algorithms 
have been developed by applying several types of signal prior 
PDFs and statistical techniques. These algorithms are included 
for simulation based comparison in Section V. 

A. BCS Algorithm 

Ji et al. proposed a Bayesian algorithm based on the sparse 
Bayesian learning (SBL) framework, called BCS fT2l . In the 
SBL framework, a two-layer hierarchical Gaussian model has 
been invoked for signal estimation. Namely, the signal prior 
PDF takes the form of 

tV «oo 

,/x(x|a, b) = TT / Af{x i ;0,'y- 1 )f r ('y i \a 1 b)d'y i , (28) 

i=i J o 

where /r(7i|a, b) is a hyper-prior following the Gamma dis¬ 
tribution with its parameters a, b. Then, the MAP estimate x 0 
of the signal can be analytically expressed as a function of the 
hyperparameter T = [ 7 - 1 ..... y N ]. the measurement matrix <t\ 
and the noisy measurements z. 

In BCS, the hyperparameter T is estimated by performing a 
type-II maximum likelihood (ML) procedure CD- Specifically, 
the type-II ML finds the hyperparameter T maximizing the 
evidence PDF, i.e., / Y (y|r) = / / Y (y|X = x)/ x (x|r) dx. 
The expectation maximization (EM) algorithm can be an 
efficient approach for the type-II ML procedure. The strategy 
of EM is to derive a lower bound on the log evidence PDF, 
log/ Y (y|r), at the E-step, and optimize that lower bound to 
find r at the M-step. The E-step and M-step are iterated until 
the lower bound becomes tighter. 

The BCS algorithm is input parameter-free, which means 
this algorithm is adaptive to any types of signals and noise 
level since BCS properly catches the hyperparameter 7 and the 
noise variance cr^ during the recovery. In addition, the BCS 
algorithm is well compatible with any type of the measurement 
matrices. 

B. CS-BP Algorithm 

Baron et al. for the first time proposed the use of BP to 
the sparse recovery problem with LDPC-like measurement 
matrices HD- The algorithm is called CS-BP. Signal model of 
CS-BP is a compressible signal which has a small number of 
large elements and a large number of near-zero elements. The 
authors associated this signal model with a two-state mixture 
Gaussian prior, given as 

N 

/x( x ) = n [qN{xi-,0,cr 2 Xi ) + (1 - q)Af{x l ; 0,ct|- o )], (29) 

i=l 

where q £ [ 0 , 1 ) denotes the probability that an element has 
the large value, and ax, &x 0 - Therefore, the prior is fully 
parameterized with a x u ■ ax ,, and q. CS-BP performs MAP or 
MMSE estimation using marginal posteriors obtained from BP 
similarly to the proposed algorithm, where the authors applied 
both nBP and pBP approaches for the BP implementation. The 
recovery performance is not very good when measurement 
noise is severe since the CS-BP was basically designed to 
work under noiseless setup. 


C. SuPrEM Algorithm 

Most recently, Akcakaya et al. proposed SuPrEM under 
a framework similar to BCS which uses the two-layer hier¬ 
archical Gaussian model for the signal prior. SuPrEM was 
developed under the use of a specific type of hyper-prior 
called the leffreys’ prior = 1 / f3i,/3i £ [T),oo] Vi £ V. 

This hyper-prior reduces the number of input parameters while 
sparsifying the signal. The overall signal prior PDF is given 
as 

tV /> 00 

/X(x) = n (30) 

i= i Jo 

SuPrEM utilizes the EM algorithm to find each hyperparam¬ 
eter /3i like the BCS algorithm. However, differently from 
BCS that calculates the signal estimate xo using matrix opera¬ 
tions which include matrix inversion, SuPrEM elementwisely 
calculates the signal estimate from /% via a pBP algorithm. 
Therefore, SuPrEM can be more computationally efficient than 
BCS. 

The measurement matrix used in SuPrEM is restricted to an 
LDPC-like matrix which has fixed column and row weights, 
called low-density-frames (LDF). They are reminiscent of the 
regular LDPC codes (24]. In addition, the signal model is 
confined to K'-sparse signals consisting of K nonzeros and 
N — K zeros since SuPrEM includes a sparsifying step which 
chooses the K largest elements at each end of iteration. The 
noise variance a^y is an optional input to the algorithm. 
Naturally, if the noise variance is provided, SuPrEM will 
produce an improved recovery performance. 

Appendix II 

Success Rate Analysis of the BHT Detection 
when $ = I 

Under the assumption of $ = I, the measurement channel 
can be decoupled to N scalar Gaussian channels which are 
Zj = Xi + Wj \/i,j £ V where clearly i = j holds. 
Accordingly, the success rate, given in (23], can be represented 
as the product of the complementary probability of the state 
error rate (SER) given in 0, i.e., P sacc = (1 - P ser ) n . 
Then, the problem is reduced to the analysis of the rate Pser 
(see Fig J4]). The conditional SER given the hypothesis H Si is 
calculated from the likelihood PDF f z \z\H Si ) as following: 

Psm\n.. ^ Si \U Si } = [_ f Zj {z\n Si )dz, (31) 

where we define the decision regions with a threshold 7 ' as 

D-Ho : = IN < i) and : = {M > i}, (32) 

and Du,, = D'H 1 vice versa. The likelihood PDFs can be 
obtained from 

fz j (z\U Si ) = j f Zj {z\Xi = x)f Xi (x\n Si )dx (33) 

as we have done in 0, where f z . ( z\Xi = x) = J\f(z; x, cr^) 
under the scalar Gaussian channel. Then, the likelihood given 
Ho simply becomes f Zj (z\Ho) = Af(z;0,ayy). In contrast. 


13 


the likelihood conditioning Hi is not straightforward due to 
the dented slab part of our prior in ([T}, which is given by 

fzj(z\Hi)oc j N{z-,x,al v )M(x-,0,(j' 1 x )dx 

^^min 

+ A / Af(z-,x,aw)dx 

|x| < s2)min 

= -V(~; 0, + a\) (34) 

x (‘- M v?) - M“)) 

+ 5 (" f (Svi) + " f (^)) 

where normalization is required to satisfy f fz . (z\Hi)dz = 1, 
and the functions A(z),B(z) are respectively described as 



In this problem, an analytical expression of j' is unattainable 
from the equality condition 0 since the PDF fzj(z\Hi) 
involves the error function terms as shown in ( |34| . Therefore, 
we utilize a root-finding algorithm to compute j'. We use the 
SNR definition given in d25l) such that SNR = 101og 10 qL ° x 
under the assumption of 3> = I. We specify the decision 
regions ( f32] > with 7', finalizing this analysis by computing the 
condition SERs, which are given as 

-fsER|-H 0 = 1 - erf (— S] , (35) 

\fjvvv 4/ 

PsEm, = 2 r fzMUtidz (36) 

Jo 

where the calculation of Pser|«i requires a numerical integra¬ 
tion owing to the error function terms in fz :j {z\H 1 ). Using 0 . 
m and ( [36] ), we can evaluate the SER, then obtaining the 
success rate of the BHT detection when $ = I. We compare 
this analysis result to the empirical results in Section V-B. 
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